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This  paper  has  two  main  themes.  It  tries  to  model  early  vision  processes 


in  terms  of  minimizing  energy  functions.  Secondly  it  examines  methods  of 


minimizing  these  functions,  in  particular  analog  style  networks. 


The  first  section  gives  some  background  on  the  use  of  energy  functions 


and  neural  networks  in  vision.  The  next  few  sections  describe  work  done 


by  the  author  and  collaborators  on  a  number  of  vision  problems.  The  final 


sections  discusses  limitations  to  this  approach. 


1.  Energy  Functions  and  Networks. 


It  is  convenient  to  divide  vision  up  into  two  stages.  In  the  first  stage  the 


visual  scene  is  analysed,  segmented,  and  properties  such  as  depth,  colour  and 


texture  are  extracted.  In  the  second  stage  objects  are  recognized  and  high 


level  information  is  used.  The  output  of  the  first  stage  is  a  representation  of 


the  scene  in  terms  of  depth  values,  colour  and  so  on.  This  representation  can 


be  called  a  2  —  1/2  D  sketch  (Marr  1982)  or  an  Intrinsic  Image  (Barrow  and 


Tennenbaum  1981).  It  is  generally  assumed  (Marr  1982,  Horn  19SG,  Ullman 


1979)  that  the  construction  of  such  a  representation  does  not  involve  any 


knowledge  of  the  world  (or  of  the  task  being  performed)  more  sophisticated 


than  low  level  assumptions,  such  as  the  rigidity  of  objects.  This  representation 


is  produced  by  a  number  of  independent  modules,  such  as  stereo,  structure 
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from  motion,  shape  from  shading.  This  paper  will  confine  itself  entirely  to 
the  modules  of  early  vision. 


A  number  of  these  modules  have  been  modelled  in  terms  of  energy  func¬ 
tions.  Ullman  (1979)  described  a  theory  of  motion  involving  solving  the  cor¬ 
respondence  problem  between  image  frames  by  minimizing  a  cost  function. 
Ikeuchi  and  Horn  (1981)  describe  a  theory  of  shape  from  shading  using  a 
variational  principle  and  Ikeuchi  (1980)  uses  a  similar  technique  for  shape 
from  texture.  Ikeuchi  (1980)  describes  how  this  technique  is  able  to  impose 
smoothness  constraints  on  the  object  and  draws  the  analogy  with  imposing 
constraints  in  Artificial  Intelligence.  These  methods  can  be  illustrated  by 
work  on  optical  flow  by  Horn  and  Schunk  (1981).  Let  the  brightness  function 
be  I(x  i ,  X2,  t).  Then,  assuming  that  points  with  the  same  image  intensity  over 
time  correspond,  the  velocity  field  (iq,^)  obeys 


dl  dl  n 

9^Vi+m=0 


where  we  use  the  summation  convention  over  repeated  indices  (for  example 
a,b,  =  a\b\  +  a2&2)-  Now  (1.1)  is  a  single  equation  for  the  two  unknowns 
(tq ,  V2)  and  does  not  specify  them  uniquely.  To  obtain  a  unique  solution  Horn 
and  Schunk  assume  continuity  of  the  velocity  field  and  minimize  a  function 
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f  dv  dv  [ ,  dl  dl,  2 

~  J  dr,dTi+  XJ  (Vid7,  +  di^  ' 


(1.2) 


d.r,  dx, 

Here  the  first  term  corresponds  to  requiring  smoothness  of  the  velocity 
field  and  the  second  to  enforcing  (1.1)  .  They  define  an  iterative  algorithm  to 
minimize  (1.2)  and  obtain  good  results. 

Many  other  vision  problems  have  been  treated  in  a  similar  way  and  we 
mention  a  few  examples.  Hildreth  (1984)  used  a  similar  method  to  solve  the 
aperture  problem  for  motion  based  on  zero  crosssing  contours.  Grimson  ( 1981 ) 
uses  a  similar  approach  to  interpolate  a  surface  through  sparse  stereo  data. 
Terzopoulos  (1984)  extended  Grimson’s  work  using  more  sophisticated  tech¬ 
niques.  Poggio  and  Torre  (1984)  descovered  the  similarity  of  these  methods 
to  a  branch  of  mathematics  called  regularization  theory  (Tikhonov  1977)  and 
proposed  a  unified  framework.  These  methods  all  had  an  important  property 
that  was  both  a  weakness  and  a  strength;  they  usually  imposed  continuous 
solutions  and  smoothed  over  discontinuities.  Regularization  theory  (Poggio 
and  Torre  1984)  indeed  required  that  the  solution  to  a  problem  depended 
smoothly  on  the  data.  Inserting  a  discontinuity  in  the  solution  would  require 
a  yes/no  decision,  and  hence  could  not  depend  continuously  on  the  data.  * 


# 


*  Terzopoulos  (1984)  suggested  the  surface  could  be  interpolated  smoothly  and 
then  the  boundaries  found  by  an  edge  detection  operation  measuring  the  “tension” 
in  the  smoothed  surface. 


This  inability  to  deal  with  discontinuities  however  had  important  practical  ad¬ 
vantages,  the  energy  functions  tended  to  be  convex  and  not  have  local  minima. 
Thus  they  could  be  minimized  by  simple  methods  such  as  gradient  descent. 
More  sophisticated  techniques  could  be  used  to  speed  up  the  convergence.  For 
example,  Terzopoulos  (1984)  adapted  a  multi-layer  algorithm  due  to  Brandt 
(1977). 

To  deal  with  discontinuities  a  new  approach  was  needed.  Geman  and  Ge- 
man  (1984)  did  work  on  image  segmentation  using  line  processors.  These  are 

f 

illustrated  in  figure  1.  There  are  two  lattices,  the  standard  space  lattice  and 
an  additional  line  processor  lattice.  The  line  processor  elements  are  either  on 
or  off.  When  a  line  processor  is  on  it  breaks  the  constraints  between  the  ad¬ 
jacent  space  pixels.  Similar  work  was  reported  by  Blake  (1983)  who  used  the 
idea  of  weak  constraints  *  ,  that  is  to  say  constraints  which  must  be  satisfied 
almost  everywhere  but  which  can  be  broken  at  a  cost.  The  binary  nature  of 
the  line  processors  means  that  discontinuities  can  be  dealt  with.  However  the 
energy  functions  are  no  longer  convex  and  new  strategies  are  needed  to  mini¬ 
mize  them.  Various  methods  have  been  tried.  Geman  and  Geman  (1984)  use 
simulated  annealing  while  Blake  (1983)  uses  a  method  which  systematically 
approximates  the  energy  function  by  a  convex  one,  graduated  non- convexity.. 

*  Based  on  work  by  Hinton  (1979). 
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Marroquin,  (1985)  Marroquin  ct  al  (1987)  interpret  the  energy  functions  in 
terms  of  probability  theory  using  the  Clifford  Hamersley  theorem,  a  connec¬ 
tion  described  in  Geman  and  Genian  ( 1984).  Instead  of  minimizing  the  energy 
function  they  use  probabilistic  algorithms  to  estimate  the  maximum  a  pos¬ 
teriori  Bayesian  estimate  of  the  solution.  Apart  from  simulated  annealing, 
which  takes  a  long  time,  none  of  these  methods  are  guaranteed  to  converge 
to  the  correct  result. 

In  this  paper  we  will  describe  an  alternative  approach  to  minimizing 
these  energy  functions  based  on  analog  networks  of  the  Hopfield  type.  Math¬ 
ematically  this  gives  a  method  of  smoothing  the  energy  function  reducing  the 
number  of  local  minima.  It  is  also  implemented  by  a  network  which  could  be 
built  in  V.L.S.I.  and  which  could  possibly  be  implemented  by  real  neurons. 
The  V.L.S.I.  network  would  be  massively  parallel  and  could  minimize  the  en¬ 
ergy  function  orders  of  magnitude  faster  than  serial,  or  parallel  computers. 
Hopfield  networks  were  originally  designed  to  be  an  associative  parallel  mem¬ 
ory  (1982,  1984).  In  Appendix  (1)  we  give  a  simple  introduction  to  Hopfield 
networks  and  then  show  how  their  formalism  can  be  extended  to  allow  some 
generalizations.  Although  the  Hopfield  networks  are  nonlinear  we  can  usu¬ 
ally  write  analytic  closed  forms  for  the  solutions  they  converge  to.  There  is 
empirical  evidence  that  they  often  converge  to  the  correct  result.  Moreover 


we  can  prove  mathematically  that  they  will  always  converge  to  a  solution  of 
the  mean  field  theory  equations  and  thus  represent  a  deterministic  method 
to  approach  the  probabilistic  solutions  (see  also  Marroquin  1987).  There  are 
also  similarities  with  the  graduated  non-convexity  approach  of  Blake  (1983). 


Many  vision  algorithms  were  designed  to  be  implemented  o<i  neuronally 
plausible  networks.  For  example  Horn’s  work  on  colour  (1973)  was  partially 
intended  as  a  possible  model  of  the  human  colour  system.  Cooperative  stereo 
algorithms  by  Arbib  and  Dev  (1975)  and  Marr  and  Poggio  (1977)  were  also 
implemented  by  simple  neural-like  elements.  *  Terzopoulos  (1984)  suggested 
the  use  of  analog  networks  for  surface  interpolation,  but  did  not  implement 
them.  Poggio  et  al  (1985)  developed  analog  networks  for  quadratic  regulariza¬ 
tion  energy  functions.  They  note  that,  supposing  we  are  considering  motion 
smoothing,  minimizing  the  energy  function  E(vi,dv,dx j)  given  by  (1.2)  is 
equivalent  to  solving  the  associated  Euler  Lagrange  equations  (Courant  and 
Hilbert  1953) 


*  Interestingly  the  Marr  and  Poggio  network  can  be  considered  as  of  discrete  Hop- 
field  network.  Discrete  Hopfield  networks,  however,  are  less  effective  than  con¬ 
tinuous  networks  for  minimizing  functions  since  the  continuous  networks  smooth 
the  energy  function  removing  local  minima. 
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dE  -  d  dE  (13) 
d(dc,/dxj)  dxj  dv, '  1  ‘  ; 

Since  E(v{,  dv,dxj )  is  quadratic  in  v,,  dv,dx }  the  equations  (1.3)  arc  linear 
in  v,  and  its  derivatives.  Any  system  of  linear  equations  can  be  modelled  by 
an  analog  network  involving  resistances,  capacitors  and  inductances  (Kaplus 
1958)  *  and  hence  (1.3)  can  be  solved  for  such  a  network. 

Hopfield  style  networks  could  also  be  adapted  to  minimize  (1.2)  and  it 
is  interesting  to  consider  the  differences  with  the  networks  described  above. 
Hopfield  networks  have  a  dynamical  update  rule,  which  for  this  problem  cor¬ 
responds  directly  to  a  continuous  form  of  steepest  descent, 


It  follows  from  the  chain  rule  of  differentiation  that  the  energy  function 
E(t)  will  continuously  decrease  with  time 


E  is  bounded  below,  by  0,  and  so  with  this  dynamics  the  system  has  to 
converge  to  a  minimum  of  E.  Thus  E  is  a  Lyaponov  function. 

The  analog  networks  of  Poggio  et  al  ( 19S5),  and  the  networks  of  Hopfield, 


* S | >e*  ial  tricks  are  needed  to  net  negative  resistors 


ryy.'y.sy.s. 


id 


can  all  be  constructed  out  of  simple  electronic  elements  (resistors,  operational 
amplifiers,  etc).  They  are  also  compatible  with  the  existing  knowledge  of  the 
electrical  behaviour  of  the  dendrites  and  axons  of  neurons.  Thus  neural  hard¬ 
ware  would  be  capable  of  implementing  such  networks,  although  the  evidence 
suggests  that  neurons  are  considerably  more  complicated. 


2.  Surface  Interpolation 

Surface  interpolation  is  a  good  example  for  illustrating  the  difference 
between  energy  functions  requiring  smoothness  and  those  allowing  disconti¬ 
nuities.  Following  the  work  of  Geman  and  Geman  for  image  restoration,  an 
energy  function  for  surface  interpolation  can  be  written  (Marroquin  1985),  * 


E(f,  I)  =  X)(/j  -  /,-+,)*(  1  -  It)  +  Cd  £(/,■  -  d,f  +  C,J2  It-  (2.1) 

t  t  t 

Here  the  d{  correspond  to  the  depth  data,  the  /,  to  the  desired  answer  and 
the  /,  to  the  line  process  elements.  Cd  and  C/  are  constants.  The  line  process 


*  For  simplicity  of  the  mathematics  we  write  the  energy  function  in  one  dimension. 
Ail  the  results  described  also  hold  for  two-dimensional  surfaces. 
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lattice  is  interposed  with  the  space  lattice,  see  figure  1.  The  /,  can  take  values 
0  or  1.  If  the  I,  are  all  set  equal  to  zero  then  E  reduces  to  standard  surface 
interpolation  with  a  membrane  *  .  This  formulation  allows  the  smoothness 
constraint,  enforced  by  the  first  term  of  (2.1),  to  be  broken  at  a  cost.  If  the 
surface  gradient,  f,  —  /l+1,  becomes  too  large  the  line  process  term  /,  can 
switch  on  at  a  cost  of  Ci  in  the  third  term.  See  figure  1. 

This  energy  function  can  be  generalized  to  two  dimensions  in  a  straight¬ 
forward  manner  (Marroquin  1985,  Blake  1983,  Koch.  Marroquin  and  Yuille 
19SG).  Additional  terms  can  be  added  to  the  energy  function  to  impose  con¬ 
tinuity  of  lines.  The  interpolation  can  be  generalized  from  membranes  to  thin 
plates  by  including  terms  with  second  order  derivatives.  This  will  require  in¬ 
troducing  surface  gradient  processors  corresponding  to  discontinuities  in  the 
surface  gradient. 

There  has  been  much  work  on  surface  interpolation  and  several  peo¬ 
ple  have  used  energy  functions  of  form  (2.1).  Different  strategies  were  used 
to  minimize  (2.1),  Blake  uses  a  technique  called  graduated  non-convexity  in 
which  a  non-convex  energy  function  is  gradually  transforned  into  a  convex 
one.  This  method  is  not  guaranteed  to  find  the  global  minima  of  the  original 
energy  function  and  may  only  work  for  dense  data,  nethertheless  it  gives  good 


I 


*  Membranes  and  thin  plates  correspond  to  minimizing  the  first  and  second  deriva¬ 
tives  of  a  surface  respectively. 


Figure  1  The  line  processors.  In  (a)  there  are  no  line  processes  and  the 
interpolation  process  assumes  that  the  points  lie  on  a  single  surface.  In 
(b)  the  line  processes  switch  on  and  break  the  surface  where  the  gradient 
is  large. 


empirical  results.  Marroquin  used  simulated  annealing  and  a  number  of  other 
statistical  and  deterministic  algorithms.  Again  these  were  not  guaranteed  to 
always  succeed  but  give  good  results. 


Since  there  are  no  guaranteed  methods  to  find  the  minima  of  (2.1)  we 


would  like  a  technique  that  may  not  always  suceed  but  which  is  very  fast.  In 
a  series  of  papers  Hopfield  (19S2,  1984,  1985)  describes  networks  made  up  of 
simple  analog  devices.  These  networks  have  two  important  properties.  Firstly 
they  could  be  implemented  in  V.L.S.I.  and  perform  parallel  computations  at 
speeds  orders  of  magnitude  faster  than  existing  parallel  (or  serial)  computers. 
Secondly  the  networks  may  be  biologically  plausible. 

From  our  viewpoint  these  networks  have  a  third  advantage;  they  are  able 
to  find  good  estimates  of  the  global  minima  of  non-convex  energy  functions. 
Hopfield  and  Tank  (1985)  demonstrated  that  they  could  find  close  solutions 
to  the  Travelling  salesman  problem  (to  within  a  few  percent  of  the  optimum 
length)  for  up  to  thirty  cities.  The  work  described  below  was  first  reported 
in  Koch,  Marroquin,  Yuille  (1986).  We  now  show  how  to  design  a  network  to 
minimize  (2.1). 


The  key  point  of  the  Hopfield  approach  is  to  replace  the  binary  variables 
/,  by  continuous  variables  V,  lying  in  the  region  [0, 1].  Each  V)  is  related  to  an 
internal  variable  (7;,  which  is  unbounded,  by  a  gain  function  g,  V{  =  g(Ui).  A 
typical  choice  for  g  is 
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This  is  a  sigmoid  function  monotonieally  increasing  from  0  as  u,  •— k  —  oo 
to  1  as  Ui  t— ►  oo.  The  parameter  A  controls  its  “sharpness”.  As  A  m  oo  it 
becomes  a  Heaviside  (step-edge)  function.  The  energy  function  can  now  be 


written 


^(/T)  =  ^(/i-/i+1)2(i-v;)+c^(/,-dl)2+c1^v;-i-c,^  g-'{V)dv 


where  the  last  term  is  a  gain  function  term.  The  dynamical  equations  are 


dUj  _  dE 
dt  ~  dVt 


dfi_  =  _dE_ 
dt  df, 


(2.4a) 


(2-44) 


Observe  that  since  Ui  is  related  to  V,  by  the  gain  function  (2.2)  the  V’ 
will  always  lie  in  (0,  lj.  This  type  of  dynamics  is  gradient  descent  for  the  /,. 
If  we  substitute  for  Ui  in  (2.4a)  it  becomes 


and  so  is  a  form  of  gradient  descent  for  the  Vi  with  a  weight  factor.  It  can 
easily  be  checked  that  E  is  a  Lyaponov  function  for  the  system 
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DoHj  -,  2  /  dE  ,  2  dV, 

Af"'  ~  Zw^  at7/  ^77“- 


dt  ^v5/i  ^W,'  dUi 


Since  is  a  monotonic  function  of  Ui  the  right  hand  side  of  (2.6)  is 


always  negative.  Thus  E  always  decreases  and  is  bounded  below,  so  the 


system  coverges  to  a  minima. 


Koch,  Marroquin  and  Yuille  (1986)  describe  the  implementation  details. 


The  system  was  simulated  on  a  Symbolics  3600  LISP  machine.  The  system 


was  tested  first  in  1-D  and  then  in  2-D.  Some  experimentation  was  needed 


to  find  suitable  values  for  the  parameters  Cd,Ci,Cg.  The  system  performed 


well  even  with  noisy  data  and  with  sparse  sampling  (sometimes  as  low  as  five 


percent  of  the  points  were  sampled). 


The  A  parameter  controls  the  degree  of  smoothing  of  the  gain  function. 


For  high  A  the  V)  are  essentially  forced  to  be  either  0  or  1  and  the  continuous 


system  (2.3)  is  close  to  the  discrete  system  (2.1).  For  small  A  the  energy  func¬ 


tion  becomes  convex  (this  can  be  verified  by  calculating  the  Hessian  of  (2.2)). 


Thus  A  corresponds  to  the  degree  of  smoothing  of  the  problem.  Altering  the 


value  of  A  to  change  the  degree  of  convexity  of  the  energy  funct  ion  is  analogous 


to  graduated  non-convexity  (Blake  1983).  For  our  simulations  the  only  differ¬ 


ence  in  running  the  networks  with  small  or  large  A  was  the  convergence  time. 


The  smaller  A,  the  longer  it  took  to  converge,  without  any  significant  effect 
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on  the  final  solution.  The  convergence  time  of  the  networks  was  usually  a  few 


time  constants  (using  the  adiabatic  expansion  it  can  be  shown  that  networks 


obeying  equations  (2.4)  reach  a  final  state  after  a  few  time  constants). 


The  fact  that  the  network  converges  even  with  high  A  suggests  a  hybrid 


strategy  to  minimize  (2.1).  The  fi  are  continuous  and  are  updated  by  gra¬ 


dient  descent  while  the  V)  are  discrete.  The  Vi  are  sampled  at  random  and 


changed  if  this  reduces  the  energy.  Some  experimental  success  is  reported 


with  this  (Koch,  Marroquin  and  Yuille  1986).  For  further  work  see  Hutchin¬ 


son  and  Koch  (1986),  Marroquin  (1985),  Marroquin  et  al  (1987).  Clearly  such 


a  strategy  will  only  work  if  the  energy  function  (2.1)  has  few  local  minima. 


We  now  perform  a  new  analysis  of  the  network  and  prove  results  showing 


that  it  converges  to  a  solution  of  the  mean  field  theory  equations.  If  we  use 


a  probabilistic  algorithm,  like  the  Metropolis  algorithm,  in  the  final  state  of 


the  system  each  line  process  /,  will  be  on  with  a  certain  probability  pi(T), 


where  T  is  the  temperature.  The  network  will  converge  to  a  state  where  the 


line  process  elements  take  deterministic  values  Pi{T)  where  T  is  inversely  pro¬ 


portional  to  A.  It  is  in  this  sense  that  the  network  is  a  solution  of  the  mean 


field  theory  equations.  Note  that,  because  of  the  coupling  with  the  depth 


field  fi ,  there  will  be  several  solutions  to  the  mean  field  theory  equations  and 


we  cannot  guaurantee  that  we  will  find  the  one  with  least  energy.  Another 


deterministic  method,  based  on  probabilistic  considerations,  of  finding  solu¬ 
tions  of  the  mean  field  theory  equations  for  this  problem  has  been  proposed 
by  Marroquin  (1987). 

Using  (2.6)  and  substituting  for  V  we  see  that  the  energy  function  E 


decreases  at  the  rate 


dE  ^—*,dEs/i  /  dE  » 

~dT  =  ~^dT>  ~ 


dE^idUj 
dUi ’  dVi' 


The  energy  is  bounded  below  so  the  system  converges  to  a  state  with 
dE/dt  =  0.  From  (2.7)  we  see  that  this  implies  (note  that  dUi/dVi  is  always 
positive  and  non-zero) 


dE  n 

W.=0’ 


(2.8  a) 


Note  that 


We  calculate 


dVi  __  2X 

dUi  ~  (eXUi  4-  e~XUi  )2 


(2.86) 


dUi  (exv‘ +  eXUi  )2  ^  ^'+1  ^  ^  +  2ACjU,). 


(2.10) 


This  function  has  zeros  at  CgUi  =  (/,+i  —  fi Y  —  Ct  and  at  ±oo.  Cal¬ 
culating  the  second  derivatives  of  E  with  respect  to  Ui  we  see  that  the  zeros 
at  ±inf  are  maxima  and  the  zero  at  Csf7,  =  (/<+i  —  /,)2  —  Ct  is  a  minimum. 
Thus  although  E  is  not  convex  with  respect  to  Ux  it  has  a  unique  minimum. 
Note  that  none  of  the  V,  will  be  exactly  0  or  1.  The  true  energy  minima  will 


therefore  have 


CgUi  =  (/,•+,  -  /,)2  -  Cl. 


(2.11) 


Recall  that,  for  large  A,  the  sign  of  U,  determines  whether  Vi  =  0  or  1. 
Thus  a  discontinuity  will  be  imposed  only  if 


(/<+. -/<)2>c,. 


(2.12) 


The  depth  terms  /,  will  obey 


(fi  ~  /<+ 1)(1  -  Vi)  +  (/,  -  /,_!)(!  -  Vi-0  +  (/.  -  di)  =  0. 


We  rewrite  (2.11)  in  terms  of  /,  as 


(2.13) 


1  4-  exp-2\((fi+\  —  fi)2  -  Ci)/Cg 


(2.14) 


We  have  shown  that  if  the  interpolated  surface  gradient  exceeds  a  thresh¬ 


old  then  a  discontinuity  will  be  inserted.  It  is  not  clear  however  that  the 
discontinuity  will  be  inserted  at  exactly  the  correct  place.  Observe,  however, 
that  in  the  absence  of  depth  data  there  is  no  clear  criterion  for  where  the  edge 
should  be. 

A  stochastic  algorithm  (for  example  Marroquin  1985)  would  use  gradient 
descent  for  the  /;,  as  in  (2.4a),  and  a  probabilistic  update  rule  for  the  line  pro¬ 
cessors.  The  line  processors  are  examined  seperately  and  updated  as  follows: 
(i)  Calculate  the  change  in  energy  A£(/,)  resulting  from  changing  the  state 
of  the  line  process  /,.  (ii)  If  A E(l{)  <  0  make  the  change,  (iii)  If  A £'(/,)  >  0 
make  the  change  with  probability  1/(1  -f  exp—2AE/T). 

This  system  will  converge  to  a  state  of  thermal  equilibrium  with 

P(h  =  0)  =  Acxpi-ifi. n  -  (2.15a) 

p(l,  =  1)  =  Aexpi-Ct/T),  (2.15  b) 

where  A  is  a  normalization  factor  to  ensure  p(l,  =  0)  +  p(I,  —  1 )  =  1.  Thus 


lAU  =  1) 


1  +  exp-((f,+  i  -  f,)2  -  c,)/T 


MG) 


Comparing  (2.16)  with  (2.14)  we  see  that  the  network  indeed  finds  a 


solution  satisfying  the  mean  field  theory  equations. 


3.  Motion  Smoothing  and  Segmentation 

We  now  briefly  describe  some  work  on  motion  smoothing  and  segmenta¬ 
tion.  Tliis  work  was  done  in  collaboration  with  C.  Koch.  Mathematically  it 
is  very  similar  to  the  work  on  surface  interpolation. 

The  problem  we  are  tackling  involves  segmenting  an  image  using  motion 
flow  (with  correspondence  based  on  image  intensity).  For  example,  detecting 
an  object  moving  over  a  textured  background. 

A  method  for  obtaining  the  motion  field  was  described  in  section  1.  This 
assumes  continuity  of  the  motion  field  and  therefore  would  break  down  at  the 
boundaries  of  the  object.  It  is  straightforward,  however,  to  modify  the  energy 
function  to  include  line  processes.  The  line  processes  should  switch  on  at  the 
boundaries  of  the  object  thereby  both  segmenting  the  scene  and  preventing 
the  velocity  field  being  distorted  by  smoothing  over  the  boundaries.  Inside 


these  boundaries  the  energy  function  should  give  the  velocity  field  as  before. 
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The  energy  function  can  be  written  as 


EKyvl},ht^v,})  =  £«>(  Ii+lJ  -  +  vf,j(Ii<j+l  -  4i))2 


+ E  ((”*+■  j  -  <i>’ + <■>?+..>  -  -  »u>  +  («,+.  -  </ 


+K>+,  -  <>)a)(l  -  »i.,)  +  E  ^  (3-D 

The  horizontal  and  vertical  line  processors  are  given  by  hij  and  vt j 
respectively.  The  V(h,v)  term  corresponds  to  the  cost  for  the  vertical  and 


horizontal  line  processes  including  the  terms  enforcing  local  continuity. 


This  energy  function  is  very  similar  to  that  for  surface  interpolation. 
Once  again  the  line  processes  produce  local  minima  in  the  energy  function 


and  no  algorithms  are  guaranteed  to  converge. 


It  is  straightforward  to  design  a  Hopfiekl  network  to  minimize  (3.1).  The 


h,  j  and  v, }  become  continuous  variables  related  to  new  variables  p,  j  and  qij 
by  h,  j  —  g(pij)  and  vtJ  —  g(q,t] )  respectively.  Gain  function  terms  for  h 


and  v  are  added  to  the  energy.  The  dynamics  are  defined  as  in  the  previous 


section 


dvx 

dt 


dE 


dv1 


(3.2a) 


Figure  2  An  example  of  an  object  moving  relative  to  a  fixed  background 
This  network  is  still  being  tested  but  preliminary  results  are  encouraging.  It 


is  able  to  segment  a  textured  object  moving  over  a  textured  background  field. 
It  can  be  mathematically  analysed,  as  for  surface  interpolation,  and  similar 
results  hold. 
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4  Motion  correspondence 


We  now  consider  a  rather  different  type  of  vision  problem  namely  motion 
correspondence  and  the  recovery  of  structure.  This  work  is  described  in  detail 
in  Grzywacz  and  Yuille  (19SC). 


Ullman  (1979)  proposed  dividing  the  structure  from  motion  process  up 
into  two  different  stages.  The  first  consists  of  matching  tokens,  such  as  points 
or  straight  lines,  between  different  image  frames;  solving  the  so  called  cor¬ 
respondence  problem.  Once  this  matching  is  done  the  second  stage  assumes 
rigidity  to  recover  the  structure  of  the  object.  Ullmanl984  later  suggested  an 
alternative  method  of  finding  the  structure  (see  also  Grzywacz  and  Hildreth 
1987)  capable  of  dealing  with  non-rigid  motion.  He  again  used  psychophysics 
to  argue  that  the  three  dimensional  structure  of  an  object  was  not  perceived 
immeadiately  but  developed  gradually  over  time.  He  proposed  that  the  vi¬ 
sual  system  constructed  an  internal  model  of  the  object,  initially  flat,  which 
was  updated  over  time  by  assuming  the  minimal  change  of  rigidity  between 
sucessive  image  frames,  the  so  called  incremental  rigidity  scheme. 


It  is  natural  to  ask  whether  errors  are  caused  by  dividing  the  process 
into  two  stages.  Both  are  solved  using  different  assumptions  and  it  is  possible 
that  these  conflict  for  some  stimuli.  It  is  also  interesting  to  see  if  rigidity 
alone  is  sufficient  to  solve  the  correspondence  problem.  To  investigate  this 
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we  define  a  cost  function  that  minimizes  incremental  rigidity  and  solves  the 
correspondence  problem  simultaneously. 

The  incremental  rigidity  scheme  maintains  a  model  of  the  object  at  any 
given  time  M(t)  =  (xj(t),  y,(f),  Zi(t))  ,i  =  1  At  the  next  timeframe,  at 

t  +  St,  the  model  is  updated  taking  into  account  the  new  information  available 
(we  are  assuming  orthographic  projection  onto  the  x,y  plane).  The  model  is 
updated  to  M(t  +  6t )  =  ((x,-(<  +  St),  y^t  +  St),  Zi(t  +  St)))  ,  i  =  1, N  where 
Zi(t+£t)  is  determined  by  minimizing  the  change  in  rigidity  between  M(t)  and 
M(t  +  St).  The  object  is  initially  assumed  flat,  i.e.  M(0)  =  ((x;(0),  j/i(0),  0)) 
i  =  1, ...,  N . 

We  first  investigate  using  rigidity  to  solve  the  correspondence  problem 
and  determine  the  structure  simultaneously.  A  measure  of  rigidity  is 


=  Mt)  -  xj(t))2  +  (y,-(0  -  y}{t))2  +  ( Zi(t )  -  Zj{t))2.  (4.1) 

The  Zi(t  +  <$*)  are  defined  to  minimize  the  change  in  rigidity  A R  between 
frames 


N 


=  ^((I,;(<)  -  L,j{t  +  St)))2.  (4.2) 

We  now  define  a  set  of  binary  correspondence  variables  (V,a).  If  point  i 
in  the  first  frame  goes  to  point  a  in  the  second  frame  then  Vta  =  1,  otherwise 


>  ’j 


\  V 
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\  la  —  0.  We  can  define  a  matching  cost  Er  by 

A' 

Er  =  XI  ((M*)  ~  +  *0) )2KaVj6-  (4.3) 

I  ,j,a,fc 

To  find  the  correspondence  and  structure  we  minimize  Er  with  respect  to 
(za{t  +  6t))  and  (V,a)  requiring  that  all  points  in  the  first  frame  aie  matched 
to  exactly  one  point  in  the  second. 

We  use  a  method  developed  by  Hopfield  and  Tank  (1985)  for  the  Travel¬ 
ling  Salesman  Problem.  Again  we  first  define  a  new  array  of  variables,  (lqa). 
These,  are  internal  variables  of  the  new  problem  and  have  a  monotonically 
increasing  relationship  to  Via : 


V,a  = 


1 


1  -(-  e  2,W'« 


(4.4) 


where  A  is  again  a  parameter  of  the  problem.  We  next  define  the  full  energy 
function  to  be: 


,  N  .V  N  n  N  N  N 

+  f  £  £  £  r-'' 


i b 


a  =  1  i  =  j  ;  - 1 
;  #•' 


i=l  a  =  1  =  i 

ft  ^  a 


•V  A' 


+f<l££r'»--v)>2  +  f£«  <4-5) 

i=i  «=i 


A'  A' 


+2xEE((V-lofi(V*a)  +  ((1  -  1/'«))1°g((1  -  v;«)))), 

1=1  a  =  l 

where  A,B,C,D,F  are  positive  parameters  of  the  problem.  (We  will  infor¬ 


mally  identify  each  of  the  terms  of  the  right  hand  side  of  Eq.  4.5  by  the 


parameter  leading  it,).  Minimization  of  the  A  term  forces  each  feature  in  the 
second  frame  to  maintain  correspondence  with  as  few  features  as  possible  in 
the  first  frame,  (and  vice  versa  for  the  B  term).  Minimization  of  the  C  term, 
forces  the  total  amount  of  correspondences  to  be  N .  Thus  the  terms  A  and 
C  will  force  N  correspondences  of  strength  1  to  be  established  in  such  a  way 
that  each  feature  of  the  first  frame  will  tend  to  have  a  correspondence  with 
a  feature  in  the  second  frame  and  so  that  the  correspondences  will  be  evenly 
distributed  among  the  features  of  the  second  frame.  It  follows  that  the  process 
will  tend  not  to  leave  any  feature  unmatched. 

The  F  term  is  necessary  to  give  a  time  constant  for  convergence  of  the 
network.  We  define  the  usual  update  laws 

1-'-N ■  l-aSN’  (4-6) 

Provided  that  A  is  large  enough  the  variables  Via  will  tend  to  be  either  0 
or  1  and  thus  it  will  tend  to  force  a  binary  decision  to  determine  whether  a 
correspondence  is  to  be  established  or  not. 

We  simulated  this  network  on  a  Symbolics  3600  LISP  machine.  The 
results  are  described  in  detail  in  Grzywacz  and  Yuillc  (19SC).  To  summa¬ 
rize  them:  despite  extensive  experimentation  with  the  parameters  the  sys¬ 
tem  rarely  converged  to  the  correct  answer  unless  given  a  hint  of  the  correct 
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matches.  The  system  made  some  interesting  mistakes,  it  would  sometimes 
choose  matches  which  were  almost  rigid  but  which  corresponded  to  compli¬ 
cated  motion  of  the  object  between  different  frames.  This  suggested  that 
rigidity  alone  was  not  a  strong  enough  constraint  and  we  should  introduce 
another  term  in  the  energy  function  corresponding  to  smoothness  of  motion 
between  frames.  After  some  experimentation  we  fell  back  on  the  energy  func¬ 
tion  Emu  used  by  Ullman  (1979)  to  solve  the  correspondence  problem. 


Emm  =  Y1Y2  V'2*<%a-  (4-7) 

“  i=l  a=l 

When  we  added  the  Emm  term  to  the  energy  E  the  network  gave  con¬ 
sistently  good  results  for  a  wide  range  of  data  Grzywacz  and  Yuille  (1986).  It 
gave  a  high  percent  of  correct  matches  for  systems  of  up  to  thirty  points. 

To  see  what  contribution  the  rigidity  term  made  to  the  matching  we 
removed  it  and  ran  the  system  using  the  minimal  mapping  term.  The  system 
gave  identical  results  suggesting  that  the  rigidity  term  was  usually  uneccessary 
for  matching.  A  possible  exception  is  at  the  occluding  boundary  of  an  object. 
Here  the  order  of  points  can  reverse  between  frames  and  the  minimal  mapping 
scheme  gave  incorrect  results.  For  small  angles  and  for  some  values  of  the 
parameters  the  rigidity  term  obtained  the  correct  matching. 


In  fact  it  is  easy  to  see  that,  with  the  correct  matching,  the  minimal 
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mapping  energy  will  be  zero  for  rigid  motion  in  a  straight  line.  This  supports 
the  idea  that  rigidity  may  be  important  for  correspondence  for  rotational 
motions.  We  plan  psychophysical  experiments  to  investigate  this  case. 

In  our  simulations  using  the  Emm  term  we  did  not  try  to  optimize  the 
parameters  .4,  B.C,  F,  M  and  A  in  any  sense.  Instead  we  found  that  the 
asymptotic  beha\  ior  of  the  system  was  the  same  for  a  large  range  of  parameter 
values  (few  orders  of  magnitude).  Typical  values  used  during  the  course  of 
this  research  were  A  =  B  =  50000.  C  =  500000,  F  =  1,  A/  =  50  and  A  =  1, 
where  the  distances  between  features  in  a  given  frame  ranged  from  1  to  10. 
We  used  homogeneous  initial  conditions  for  our  simulations,  i.e. : 


Via(t  =  0)  = 


AT 


(4.8) 


We  also  tested  our  network  on  simple  situations  such  as  dot  splitting, 
when  there  are  two  dots  in  the  second  frame  equidistant  from  a  dot  in  the 
first.  Interestingly  our  network  gave  the  correct  psychophysical  result;  the 
dots  split  into  two  with  Vai  =  1/2.  This  suggests  psychophysical  experiments 
comparing  the  predictions  of  the  networks  to  that  of  observers  for  other  simple 
stimuli. 

There  is  an  important  practical  advantage  for  V.L.S.I.  circuits  in  only 
using  the  minimal  mapping  term  for  matching.  It  can  be  shown  (Grzywacz 


and  Yuille  19S6)  that  the  MM  term  can  be  clioosen  to  be  linear  in  the  matching 


term,i.e. 


-"EE*'-*- 


t=l  a  =  l 


This  gives  an  energy  fnnetion 


.  A’  N  K  N  N  JV 

*  =  4EEE'^.  +  fEEE'^> 


0  =  1  !=1  J=  1 
>*< 


1=1  0  =  1  i  =  l 


4((EE1-»-Ar))2  +  £^ 


(4.10) 


1=1  0  =  1 


N  N 


+frEE««.  log(Vi.)  +  ((1  —  Via))  log((l  —  Tio)  ))  ), 


1=  1  a  =  l 


This  means  that  the  dia's  do  not  affect  the  connection  strengths  between 
elements.  Thus  the  connection  strengths  will  not  have  to  be  changed  with 
each  time  frame. 

The  energy  function  described  above  is  rather  different  from  those  in  the 
previous  two  sections.  It  looks  considerably  more  complicated  and  a  lot  more 
care  is  needed  to  find  the  correct  parameters.  Unlike  the  previous  problems  the 
value  of  A  is  important,  it  has  to  be  small  compared  to  the  other  parameters 
of  the  problem.  Finally  the  similarity  of  the  energy  function  to  that  used  by 
Hopfield  and  Tank  for  the  Travelling  Salesman  Problem  suggests  it  contains 
many  local  minima.  Given  all  this,  the  success  of  Hopfield  style  networks 
for  this  problem  is  encouraging.  Hopfield  networks  give  progressively  worse 


28 


results  for  T.S.P.’s  with  more  than  thirty  cities.  We  can  prove,  however, 
(Grzywacz  and  Yuille  1986)  that  our  network  will  always  yield  the  correct 
answer  if  the  motion  between  frames  is  sufficiently  small  compared  to  the 
seperation  of  the  dots.  Thus  if  time  between  frames  is  small  enough  the 
correct  matching  will  be  made.  Other  ways  of  dealing  with  large  numbers  of 
points  will  be  discussed  in  section  6. 

As  for  surface  interpolation  we  can  derive  an  analytic  expression  for  the 
solution.  Using  the  chain  rule  for  differentiation  we  find 


S-^dUia  dE  3E 
^  dVia  dUia  &Uia 


(4.11) 


From  (4.4)  we  calculate 


dUia  cosh2\Uia 


(4.12) 


This  term  is  always  positive  definite  as  it  is  bounded  below  by  1/2A.  The 
energy  is  bounded  below  and  so  the  system  reaches  a  final  state  with  dE/dt  = 
0.  Using  (4.11)  and  (4.12)  we  see  that  a  necessary  and  sufficient  condition  for 


such  a  state  is 


(4.13) 


Inverting  (4.4) 


Differentiating  (4.10) 


=  mr  (A(Vf 1 OL  +  VaROW-2Via)+C(V-n)+<tlek  +  FUia).  (4.15) 

Here  we  have  introduced  new  notation.  V  =  ^ia'  VtCOL  =  ^2a  V'ia 
and  yaROlv  =  ^2 j  Via .  We  have  also  absorbed  the  constant  M/2  into  the 
definition  of  d{a. 

Observe  that  (4.15)  vanishes  at  Uia  =  ±oo.  But  we  can  show  that  these 
roots  do  not  correspond  to  minima  and  hence  are  not  solutions.  To  prove  this 
we  must  show  that  the  Hessian  of  E  with  respect  to  the  Uta  is  not  positive 
definite  there.  We  calculate 


d2E 

dU2a 


2\ 

COsh2\Uia 


(( 


4A 

COsh2\Uia 


+  F) 


—  2XtanhXUia  (A(ViCOL  +  VaROW  -  2Vta)  +  C(V  -  n)  +  d2ck  +  FU,a )).  (4.16) 

For  large  U{a  the  dominant  term  inside  the  bracket  is  —2XFU iatan}> XU ta. 
Thus  d2E/dUfa  will  tend  to  zero  from  below  as  Uta  *— ►  ±oo.  Hence  the  Hessian 
cannot  be  positive  definite  there  and  the  roots  at  infinity  are  not  minima. 
Now  consider  the  other  roots  of  (4.15).  These  obey 


I 


A(VtCOL  +  VaROW  -  2Via)  +  C(V  -  n)  +  <&  +  FUta  =  0  (4.17) 


and  are  also  roots  of  dE/dVia  =  0.  At  such  points 


d2E  _  d2E  dVta  dVjb 
dUiadlljb  ~  dVtadV]b  dUta  dU)b 


and  so  the  Hessian  of  E  with  respect  to  Uta  will  be  positive  definite  if  and 
only  if  the  Hessian  of  E  with  respect  to  Vja  is  positive  definite.  We  calculate 


d2E  F 

dViadVia  ~  +  Ve*(l  -  Vck) 


(4.19a) 


d2E 

dViadVib 


=  A  +  2C 


(4.19b) 


d2E  *  ^ 
dViadVja  ~  +  C 


(4.19c) 


mravT, =c- 

which  is  positive  definite. 

Thus  the  system  will  converge  to  a  minima  of  the  energy  function  which 
are  given  by 


r 


3 


Unlike  the  surface  interpolation  case  we  have  found  no  simple  interpre¬ 
tation  of  these  equations  in  general,  although  results  can  be  obtained  for 
particular  cases  (Grzvwacz  and  Yuille  1986).  Again  notice  that  the  final  V',,, 
must  lie  between  the  limits  0  and  1. 


5  Stereo. 

A  natural  way  of  writing  stereo  in  the  form  of  an  energy  function  is  as 
follows  (Barnard  1986,  Horn  1986,  Gennert  1987) 


E{d)  =  ~  R'+d(x))2  +f*^2(d(i  +  l)-d(i))2.  (5.1) 

I  I 

Here  d(i)  is  the  disparity  between  the  images.  £,  and  R,  are  measures  of 
the  left  and  right  images  and  can  be  continuous  or  discrete.  For  example  they 
could  be  the  image  intensities,  or  they  could  be  the  positions  of  zero  crossings. 
The  first  term  in  (5.1)  matches  the  left  and  right  images  in  such  a  way  as 
to  minimize  the  disparity  gradient  represented  by  the  second  term.  Line 
processors  can  be  introduced  to  prevent  the  disparity  gradients  from  becoming 


too  large  and  allows  distinct  objects  to  influence  each  others  matching.  *.  This 


yields 


E(d,  l)  =  -  R,+d(l])2 (1  -  /.)  +  /<£>(*  +  1)  -  d(*))2(l  -  1 •)  +  l" 

t  l  I 

(5.2) 

The  /,  performs  two  functions  in  (5.2).  It  prevents  the  disparity  gradient 
from  becoming  too  large  but  it  also  prevents  matching  in  the  first  term  if  the 
difference  between  the  images  is  too  great.  This  latter  effect  may  help  deal 
with  occluding  situations  when  one  eye  sees  a  region  which  the  other  cannot. 
It  is  simple  to  generalize  this  energy  function  to  two  dimensions. 

It  is  straightforward  to  design  a  Hopfield  net  for  this  problem  in  the  usual 
manner.  We  simulated  this  on  two  types  of  examples  (in  two  dimensions).  The 
first  consisted  on  a  sine  wave  with  the  central  square  displaced  and  the  second 
was  a  standard  random  dot  stereogram.  The  results  were  disappointing  in 
both  cases.  Although  it  was  possible  to  get  roughly  the  right  answer  for  the 
sine  wave  the  random  dot  pattern  gave  consistently  poor  results.  This  occured 
despite  filtering  the  images  with  a  variety  of  filters,  gaussians  and  difference 
of  gaussians  at  various  scales. 

We  believe  that  this  bad  behaviour  is  due  to  the  complicated  structure 
*  This  work  was  done  in  collaboration  with  T.  Poggio 
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of  the  energy  as  a  function  of  d.  The  behaviour  of  this  function  is  crucially 
dependent  on  the  structure  of  R.  For  example  if  R  is  a  linear  function  of  its 
arguments  the  energy  will  be  quadratic  in  d  and  therefore  well  behaved.  If  R 
is  more  complex,  in  particular  if  it  arises  from  a  random  dot  stereogram,  the 
energy  function  can  be  a  complicated  function  of  d  with  many  local  minima. 
To  apply  this  method  to  a  general  scene  would  require  smoothing  the  images 
(by  an  amount  determined  by  pre-processing)  until  the  function  R  was  suffi¬ 
ciently  well  behaved,  avoiding  smoothing  the  image  too  much  to  destroy  its 
interesting  features.  While  this  is  conceivable  it  seems  that  alternative  stereo 
algorithms  are  likely  to  be  more  sucessful. 

A  possible  approach  is  to  attempt  to  minimize  (5.2)  using  other  algo¬ 
rithms.  Some  success  (Barnard  19SG)  has  been  reported  for  using  simulated 
annealing  for  (5.1)  (in  two  dimensions).  So  far  attempts  to  use  simulated 
annealing,  and  other  stochastic  methods,  to  minimze  (5.2)  have  not  been 
sucessful.  Although  it  has  been  possible  to  hand  tune  the  parameters  to  got 
the  correct  result  for  one  class  of  stimuli,  sign  waves  with  displaced  centres, 
it  will  not  work  on  others,  such  as  random  dots.  Various  refinements  of  the 
methods  have  been  tried  including  a  coarse  to  fine  strategy  of  convolving  the 
image  with  a  large  scale  gaussian,  minimizing  the  energy  at  this  scale  and 
using  this  to  guide  the  matching  at  smaller  scales. 


This  result  is  negative  and  by  no  means  conclusive.  More  sophisticated 


algorithms  could  be  tried  including  perhaps  methods  of  estimating  the  pa¬ 
rameters  of  the  energy  function  directly  from  the  image.  Alternative  energy 
functions  could  be  tried.  We  believe,  however,  that  this  may  represent  a  limit 
for  the  practical  use  of  energy  functions.  If  an  energy  function  is  too  compli¬ 
cated  for  straightforward  algorithms  to  solve  then  the  problem  is  badly  posed 
and  more  heuristic  methods  should  be  used.  This  will  be  discussed  further  in 
the  next  section. 


6  Limitations  of  the  energy  function  approach. 

The  previous  four  sections  described  attempts  at  modelling  problems  in 
terms  of  energy  functions  using  analog  nets,  The  first  three  attempts  were 
reasonably  sucessful  and  the  last  one  failed.  We  argued  that  this  reflected  the 
relative  complexities  of  the  energy  functions  being  minimized. 

It  is  clearly  possible  to  write  any  vision  problem  in  terms  of  minimizing  an 
energy  function  *  .  It  is  less  clear  that  this  is  a  good  strategy.  For  non-trivial 

*  In  the  same  way  that  all  the  laws  of  physics  can  be  summarized  in  one  equation 


problems  it  often  leads  to  energy  functions  with  many  local  minima  depending 
on  large  numbers  of  parameters.  These  parameters  will  often  depend  on  the' 
image  being  viewed  and  might  even  have  different  values  in  different  regions 
of  the  image  *  .  As  yet  there  is  no  reliable  way  to  estimate  these  parameters, 
or  of  minimizing  energy  functions  with  many  local  minima. 

Energy  function  methods  seem  a  natural  idea  for  matching  representa¬ 
tions  of  images  containing  a  large  number  of  similar  primitives.  Random  Dot 
stereograms  (introduced  by  Julesz  1971)  are  ideally  suited  for  this  type  of 
algorithm.  Realistic  images,  however,  contain  many  different  features  of  vary¬ 
ing  sizes.  A  good  strategy  for  stereo  could  involve  matching  the  most  salient 
features  and  using  this  to  guide  the  ambiguous  features.  The  work  on  stereo 
by  Mitchison  and  McKee  (1987)  shows  that  for  one-dimensional  sterograms 
the  positions  of  the  endpoints  have  an  important  effect  on  the  matching  of 
the  interior  points.  Another  interesting  example  of  this  type  is  psychophysics 
for  motion  correspondence  illustated  by  Ramachandran’s  analogy  of  a  mov¬ 
ing  leopard  (Ramachandran  1985,  Ramachandran  and  Anstis  1983).  If  the 
outline  of  the  leopard  is  not  visible  the  leopard’s  spots  are  matched  to  nearby 
neighbours  and  no  motion  is  seen.  If  the  outline  is  also  visible  then  its  mo- 
by  summing  the  squares  of  all  the  individual  laws  (Feynman  1963). 

*  For  example  in  the  way  the  noise  thresholds  are  determined  locally  for  the  Canny 


edge  detector. 


tion  “captures”  the  spots  and  they  are  matched  correctly.  This  suggests  that 
random  dot  stereograms  are  a  limited  paradigm  and  that  although  humans 
have  the  capacity  to  match  them  correctly  they  may  not  be  the  strategy  used 
for  real  images  when  more  information  is  available. 


Moreover  the  energy  function  approach,  at  least  in  its  most  naive  form, 
tends  to  ignore  some  of  the  structure  of  the  problem.  For  example,  the  motion 
smoothing  and  segmentation  algorithm  described  in  section  3  would,  in  theory, 
be  able  to  detect  the  boundaries  of  the  leopard  but  it  would  not  be  able  to 
use  the  speed  of  the  boundary  directly  to  influence  the  internal  matching. 
There  would  certainly  be  an  indirect  influence,  since  the  motion  is  required 
to  be  smooth,  but  this  would  be  weak  and  depend  on  the  distance  from  the 
boundary.  One  can  contrast  this  with  a  more  heuristic  approach  which  would 
analyse  the  scene,  detect  the  object  boundaries,  estimate  their  velocities  and 
use  this  as  initial  data  for  matching  the  interior  of  the  object.  The  algorithm 
could  be  designed  in  terms  of  several  different  networks  connected  together 
and  certainly  individual  parts  of  this  method  could  be  implemented  by  energy 
functions.  For  example  Grzywacz  (private  communication)  has  shown  that  the 
network  described  in  section  4  will  correctly  match  the  leopard’s  spots  if  the 
estimated  motion  is  available  as  initial  data. 


The  stereo  energy  function  (5.2)  also  does  not  capture  some  of  the  im- 
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portant  features  of  flip  problem.  Places  where  the  line  processors  should  be 
on  correspond  to  boundaries  of  objects,  and  therefore  should  correspond  to 
edges  in  the  image.  Thus  it  would  be  more  sensible  to  find  these  edges  by 
simple  processing  of  the  image  rather  than  by  minimizing  (5.2). 

For  any  given  module,  such  as  stereo,  there  are  many  possible  algorithms 
some  matching  image  intensity,  others  matching  edge-like  features.  The  rela¬ 
tive  effectiveness  of  these  different  algoritluns  will  depend  on  the  images  being 
viewed,  an  edge-based  stereo  algorithm  would  be  ineffective  in  a  scene  con¬ 
taining  few  or  weak  edges.  For  some  scenes  it  would  be  natural  to  try  to  find 
and  then  match  salient  features,  such  as  the  occluding  boundaries  of  objects 
or  regions  of  high  texture  density.  Heuristics  like  coarse  to  fine  matching  ,as 
used  by  Marr  and  Poggio  (1979)  for  stereo,  could  also  be  used.  For  realistic 
images  a  stereo  system  might  have  to  use  a  number  of  different  algorithms,  or 
submodules,  interacting  with  one  another  and  combining  to  give  the  solution. 
These  submodules  might  each  be  implemented  in  terms  of  networks  minimiz¬ 
ing  energy  functions  but  the  most  important  part  of  the  calculation,  and  the 
hardest  part  of  designing  such  as  system,  would  lie  in  the  control  strategy  for 
combining  the  different  submodules. 

Thus  although  minimizing  energy  functions  is  a  useful  technique  for  early 
vision  it  has  definite  limitations.  If  the  energy  has  too  many  local  minima  it 
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may  be  better  to  try  heuristic  methods  to  avoid  them  rather  than  to  use 


complicated  search  techniques.  Minimizing  an  energy  function  is  only  one  of 


the  many  different  search  strategies  used  in  Artificial  Intelligence  research  and 


is  only  effective  for  certain  problems. 


7  Conclusion. 


We  described  how  a  number  of  problems  in  early  vision  could  be  de¬ 


scribed  in  terms  of  minimizing  energy  functions.  We  showed  that  Hopfield 


style  networks  were  able  to  give  fast  reliable  answers  for  some  of  these.  We 


discussed  the  limitations  of  this  approach. 
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Appendix 


A.l.  Hopfleld’s  Formalism 


Hopfield  describes  a  system  of  neurons  with  output's  V,  and  an  external 
input  I, ,  where  i  runs  from  1  to  n.  He  defines  an  update  rule  by 


„  dui  \ — '  „  . ,  u, 

c,~dT  =  ^T,}  j~ 


(.4.1.1) 


Here  TX]  is  a  measure  of  the  strength  of  the  connection  between  neuron 
i  and  neuron  j.  A  priori,  every  Tt]  can  be  positive,  negative  or  zero.  C,  is 
a  capacitance  and  Ri  =  1/  T,j  a  resistance  associated  with  every  neuron. 
Ui  represents  some  internal  function  of  the  neuron  i,  for  example  its  somatic 


potential,  and  is  given  by 


=  g;\v.) 


(.4.1.2) 


where  g{x)  is  a  monotonic  increasing,  but  bounded  function.  This  model 
is  deterministic,  its  final  state  will  depend  on  the  initial  conditions  and  the 


inputs  Hopfield  argues  that  it  embodies  a  content  addressable  memory. 
To  do  this  he  defines  an  energy”  function 


v'v'vV\-'vbv/vbbV>A.-,v'v'v'vv'v‘vvvb'vv"-.-s/vb>-c'vbVkSv->'vsi-b's'/ 
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E(V, i)  =  ^  jf'  9‘^V)dV  +  'L,'V' 


with  the  restrictions  T,y  =  Tji  and  Tit  —  0.  Differentiating  (A. 1.3)  using 


the  chain  rule  and  substituting  (A. 1.1)  and  (A. 1.2)  we  have 


(.4.1.4) 


which  is  always  negative.  E  is  hence  a  Lyaponov  function  and  by  standard 
results  of  Statistical  Mechanics  any  solution  to  (A.  1.1)  will  converge  to  one  of 
a  fixed  number  of  stable  points,  provided  E  is  bounded  below.  The  precise 


fixed  point  (A. 1.1)  converges  to  will  depend  on  the  initial  conditions  and  the 
external  inputs.  The  potential  of  the  system  contains  a  large  number  of  local 
minima  and  the  minima  the  system  ends  up  in  is  determined  by  the  initial 
conditions.  The  system  can  therefore  be  thought  of  as  a  content  addressable 
memory.  The  requirements  that  the  connectivity  matrix  T  is  symmetric  is 


needed  in  order  to  assure  that  the  update  of  u,  has  the  form  of  equation 


;a.i.i). 


A. 2  Extending  Hopfield’s  networks:  We  will  now  proceed  to  propose 
a  more  general  class  of  networks  and  update  rules,  The  essence  of  Hopfield’s 
networks  can  be  described  as  follows.  First  we  define  an  “energy"  function 
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E  =  E(V,). 


(.4.2.1: 


We  should  emphasize  the  quotation  marks  round  “energy”.  It  is  not 
necessarily  the  energy  of  the  physical  system  but  merely  a  Lyapunov  function. 
It  is  bounded  below.  We  will  reformulate  our  update  rule  as 


dVL_  y>  dE 


(.4.2.2) 


This  is  a  generalization  of  (A. 1.1).  In  order  for  E  to  be  a  Lyapunov 
function,  its  temporal  derivative  must  be  everywhere  negative  or  zero.  In 
other  words,  the  “energy”  must  always  decrease  or  at  most  remain  constant, 
but  never  increase.  Differentiating  (A. 2.1)  using  (A. 2.2)  yields 


dE  _  y-  1  dVidVj  y-  BE  dE 
dt  ~  ^  ')  dt  dt  ~  ^AijdVi  dV,' 

\ .  1  1.  t  J 


(.4.2.3) 


E  is  a  Lyapunov  function  (and  hence  the  system  has  a  content  addressible 
memory)  if  and  only  if  A,;  is  positive  definite,  i.e.  xT Ax  >  0  for  all  vectors  x. 
For  any  arbitrary  function  E  there  are  therefore  an  infinite  number  of  possible 
updating  rules  and  so  an  infinite  number  of  possible  systems. 


It  is  straightforward  to  check  that  Hopfields  network  defined  by  (A. 1.1), 
A. 1.2)  and  (A. 1.3)  can  be  obtained  by  setting 


gw 


•3 


A-'  = 


(A.2A) 


We  will  show  in  A. 3  that  Hopfield’s  energy  function  will  still  be  a  Lya¬ 
punov  function  if  we  set 


A~>  =  Cig-v(Vt)l,6lv 


(A.2.5) 


where  the  /,  are  an  arbitrary  set  of  positive  numbers.  This  enables  us  to 
relax  the  symmetry  constraint  on  the  Tt)  to 


T  /.  —  t  l 


(.4.2.6) 


where  there  is  no  summation  over  the  indices.  We  prove  in  the  next 
section  that  for  an  n  x  n  matrix  this  gives  us  an  additional  n  —  1  degrees  of 


freedom. 


It  is  important  to  note  that  the  number  of  constraints  is  less  important 
than  the  form  of  the  constraints.  In  general,  the  matrix  T{}  will  be  sparse, 
since  most  connections  do  not  exist.  This  becomes  especially  true  if  n  is 
large.  The  /*’ s  can  then  be  applied  to  the  remaining  non-zero  TtJ.  Hopfield’s 
symmetry  conditions  implies,  that  once  the  top  left  half  of  TtJ  is  specified, 
the  bottom  half  will  be  completely  determined,  since  Tij  =  T}i ,  independent 
of  the  number  of  zeros  in  T,,.  In  our  extension,  however,  n  —  l  of  the  non- 
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zero  entries  can  be  specified  at  will,  as  long  as  T, 3  and  X),  have  the  same 
sign.  This  condition  prohibits  the  use  local  inhibitory  interneurons,  that  is 
neurons  which  are  excited  (T,j  >  0)  but  which  inhibit  in  turn  (T,,  <  0).  If  the 
network  is  used  as  a  content  addressible  network  without  allowing  too  many 
errors  daring  “recall”,  only  6%  of  the  possible  n2  -  n  entries  are  different 
from  zero  (Hopfield,  1982).  Thus,  for  n  =  30  and  using  Hopfield’s  symmetric 
network,  only  about  26  Tjj’s  can  be  specified  while  the  remaining  oiws  are 
fixed.  Our  extension  implies,  however,  that  all  Tp  can  take  on  arbitrary 
values  —  as  long  as  the  sign  of  transposed  elements  is  the  same.  Notice  that 
these  conditions  in  no  way  constrain  the  diagonal  terms  T,,.  * 

A. 3  The  full  extension: 

Formally,  the  integrability  condition  (A. 2.1)  in  E  is  analogous  to  the  ex¬ 
istence  of  potentials,  or  state  functions,  in  Thermodynamics.  If  this  condition 
is  not  met,  then  the  value  of  E  depends  on  the  path  taken  in  Vi  space  and 
hence  cannot  be  a  Lyapunov  function. 

There  are  many  other  ways  of  constructing  Lyapunov  functions  for  a 
system  with  a  given  update  rule. 

In  order  to  study  the  class  of  connectivity  matrices  T,j  leading  to  con¬ 
verging  behaviour,  we  will  now  express  Hopfield’s  update  rule  as 


*  An  alternative  to  our  proof  is  scaling  u,  ami  V,  independently  (Hopfield,  private 
communication ). 


uVvytaM 


r 
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,‘•1 


i  (Ttv  «'(*) 


(A.3.1) 


dt  ci9-l,{v,y y  * 

By  inverting  equation  (A. 2. 2)  we  ran  define  the  appropriate  Lyapunov 


function 


9£=  rA-i^ 

av.  •> 


(A. 3.2) 


«Vf)=-  /£■ 


(A.  3.3) 


Substituting  from  (A.3.1)  gives  an  integrand  / 


(A.3.4) 


If  this  expression  is  integrable,  that  is  its  value  is  independent  of  the 
path  along  which  the  integral  was  evaluated,  then  E  is  well-defined  and  a 
Lyapunov  function.  A  function  I  is  integrable  if,  and  only  if,  dl  ==  0  where 
d  is  the  exterior  derivative  operator  (Misner,  Thorne,  Wheeler,  1977).  Define 
D,j  and  hi  by  A”1  =  C,g~u(Vi)Bl}  and  hi  =  <71-1(Vrl)/.Rl.  The  integrand  is 


/  =  J2  B,}T,kVkdV}  -  £  BijhidVj  -  Y,  BijI%dVj. 

i,j,k  i,j  i,j 


(A.3.5) 


If  13, j  and  TtJ  are  independent  of  the  V, 's  then  I  is  inferrable  provided 


(II  =  Y  Dk)Tk,dV,xdVj  -  Y  D>}~, ^-dV.xdVj  =  0.  (.4.3.0) 

L  ^  *  l 

ij 

Since  <71 ",  xd\'}  is  antisyniinetrie  in  /  and  j  this  will  hold  provided  7? tj7* 
is  symmetric  in  i  and  j.  If  h,  is  non-linear  this  will  only  be  possible 
provided  D,}  does  not  contain  off-diagonal  terms.  Thus,  with 


B,j  =  1,6, j. 


(.4.3.7) 


where  the  s  are  positive  numl>ers,  equation  (A. 3. 4)  changes  into 


Ef,M£r„.n- 

i  J  k 


R, 


(.4.3.S) 


If  l,T,j  is  symmetric,  then  (A. 3. 8)  can  be  integrated  since  it  consists  only 
of  terms  like  d(V,V})  and  g~ 1  ( l",  )d] ,  which  are  well-defined.  TIhls,  we  can 
generalize  Hopfields’s  result  to  all  matrices  TtJ.  provided  l,T,j  is  symmetric 
and  A,j  is  positive  definite.  Following  (A. 3. 7),  matrix  .4  mast  be  diagonal 
and  therefor*'  its  eigenvectors  are  equal  to  the  /,' s,  each  of  which  can  be  any 
arbitrary  positive  numbers.  Whatever  the  values  of  the  s  there  are  definite 


relations  which  must  hold  between  the  T,r  In  particular,  for  all  i,j,k,  we 


with  the  auxiliary  conditions 


T,j  ■  Tji  >  0. 


(.4.3.96) 


More  complicated  relations  can  be  deduced  but  they  can  all  be  obtained 
by  combining  relations  of  type  (A. 3. 9). 


We  now  examine  the  constraints  in  more  detail.  Suppose  we  specify  the 
values  Tij,  i  <  j  in  the  upper  right  half  of  the  matrix.  Then  the  values  in  the 
lower  left  half  are  given  by 


TJt  =  TtJj-.  (A.3.10) 

*1 

We  will  ignore  the  case  when  all  elements  in  column  j  are  zero  (and 
therefore  also  all  elements  in  row  j),  since  the  order  of  the  matrix  will  simply 
be  reduced  by  one.  Otherwise  the  lower  left  half  of  the  matrix  is  determined 
by  the  l,/l}.  Now  all  the  l,/lj  can  be  determined  from  a  basic  n  —  1  elements 


h  h  ln-l 


(.4.3.11) 


dE  v-  dE  dV, 
dt  ^  dv.  dt 

j  1 


(.4.3.12) 


for  all  motions.  Requiting  the  update  rule  to  he  related  to  E  by  (A. 3. 2) 
will  ensure  (A.3.G).  There  are  many  other  ways  of  enforcing  (A .3 .12).  For 
Hopfield’s  updating  rule,  however,  we  have  been  unable  to  Hud  any  other 
iutegrable  Lyapunov  function. 
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